Search CORE


A Direct Estimation Approach to Sparse Linear Discriminant Analysis

Author: Cai Tony
Liu Weidong
Publication date: 01/01/2011

This paper considers sparse linear discriminant analysis of high-dimensional data. Existing methods estimate the precision matrix $\Omega$ and the difference $\delta$ of the mean vectors separately. In contrast, we introduce a simple and effective classifier that estimates the product $\Omega\delta$ directly through constrained $\ell_1$ minimization. The estimator can be implemented efficiently using linear programming, and the resulting classifier is called the linear programming discriminant (LPD) rule. The LPD rule is shown to have desirable theoretical and numerical properties. It exploits the approximate sparsity of $\Omega\delta$ and, as a consequence, can still perform well even when $\Omega$ and/or $\delta$ cannot be estimated consistently. Asymptotic properties of the LPD rule are investigated, and consistency and rate of convergence results are given. The LPD classifier has superior finite-sample performance and significant computational advantages over existing methods that require separate estimation of $\Omega$ and $\delta$. The LPD rule is also applied to analyze real datasets from lung cancer and leukemia studies, where the classifier performs favorably in comparison to existing methods.

Comment: 39 pages. To appear in Journal of the American Statistical Association
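A minimal Python sketch of the LPD idea follows. It assumes NumPy and SciPy are available and uses SciPy's `linprog` with the HiGHS backend as the LP solver. The function names `lpd_direction` and `lpd_classify`, the pooled-covariance normalization and the equal-prior decision rule are illustrative assumptions, not the authors' code.

The sketch solves $\min \|\beta\|_1$ subject to $\|\hat\Sigma\beta - \hat\delta\|_\infty \le \lambda$. It does this by splitting $\beta = u - v$ with $u, v \ge 0$. A point $x$ is then assigned to class 1 when $(x - \hat\mu)^\top\hat\beta \ge 0$, where $\hat\mu = (\hat\mu_1 + \hat\mu_2)/2$.

```python
import numpy as np
from scipy.optimize import linprog


def lpd_direction(X1, X2, lam):
    """Sketch: estimate beta ~ Omega @ delta via
    min ||beta||_1  s.t.  ||Sigma_hat @ beta - delta_hat||_inf <= lam."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    delta = mu1 - mu2
    # Pooled within-class sample covariance (1/(n1+n2) normalization is an assumption).
    S = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (n1 + n2)
    # Write beta = u - v with u, v >= 0 so that ||beta||_1 = sum(u) + sum(v).
    c = np.ones(2 * p)
    A = np.hstack([S, -S])  # A @ [u; v] == S @ beta
    # |S beta - delta| <= lam  <=>  S beta <= lam + delta  and  -S beta <= lam - delta
    A_ub = np.vstack([A, -A])
    b_ub = np.concatenate([lam + delta, lam - delta])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    beta = res.x[:p] - res.x[p:]
    return beta, mu1, mu2


def lpd_classify(X, beta, mu1, mu2):
    """Assign class 1 when (x - (mu1 + mu2)/2)^T beta >= 0, else class 2 (equal priors)."""
    scores = (X - (mu1 + mu2) / 2) @ beta
    return np.where(scores >= 0, 1, 2)
```

In this sketch, `lam` controls the trade-off. A larger value yields a sparser $\hat\beta$. In practice it would be tuned, for example by cross-validation.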

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Adaptive Thresholding for Sparse Covariance Matrix Estimation

Author: Hawkins D. L.
Tony Cai
Weidong Liu
Publication date: 01/01/2011

In this paper we consider estimation of sparse covariance matrices and propose a thresholding procedure that is adaptive to the variability of individual entries. The estimators are fully data-driven and enjoy excellent performance both theoretically and numerically. It is shown that the estimators adaptively achieve the optimal rate of convergence over a large class of sparse covariance matrices under the spectral norm. In contrast, the commonly used universal thresholding estimators are shown to be sub-optimal over the same parameter spaces. Support recovery is also discussed. The adaptive thresholding estimators are easy to implement. Numerical performance of the estimators is studied using both simulated and real data. Simulation results show that the adaptive thresholding estimators uniformly outperform the universal thresholding estimators. The method is also illustrated by an analysis of a dataset from a small round blue-cell tumor microarray experiment. A supplement to this paper containing additional technical proofs is available online.

Comment: To appear in Journal of the American Statistical Association
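The entry-adaptive threshold can be illustrated with a short NumPy sketch. It assumes a threshold of the form $\lambda_{ij} = \delta\sqrt{\hat\theta_{ij}\log p / n}$, where $\hat\theta_{ij} = \frac{1}{n}\sum_k[(X_{ki}-\bar X_i)(X_{kj}-\bar X_j) - \hat\sigma_{ij}]^2$ estimates the variance of each product term. Under that assumption, each entry gets its own cut-off instead of one universal value.

The following choices are assumptions of this sketch: the function name `adaptive_threshold_cov`, the default $\delta = 2$, the use of soft thresholding, and leaving the diagonal unthresholded.

```python
import numpy as np


def adaptive_threshold_cov(X, delta=2.0):
    """Sketch of entry-adaptive soft thresholding of the sample covariance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n  # sample covariance with 1/n normalization
    # theta_ij: empirical variance of the products (X_ki - xbar_i)(X_kj - xbar_j)
    prods = Xc[:, :, None] * Xc[:, None, :]  # shape (n, p, p)
    theta = ((prods - S[None, :, :]) ** 2).mean(axis=0)
    lam = delta * np.sqrt(theta * np.log(p) / n)  # one threshold per entry
    # Soft-threshold every entry with its own lambda_ij.
    Sigma = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    # Assumption of this sketch: keep the sample variances on the diagonal.
    np.fill_diagonal(Sigma, np.diag(S))
    return Sigma
```

Compared with a universal threshold, entries with noisier product terms (larger $\hat\theta_{ij}$) are thresholded more aggressively.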

arXiv.org e-Print Archive

Crossref

Research Papers in Economics

ScholarlyCommons@Penn

A Direct Estimation Approach to Sparse Linear Discriminant Analysis

Author: Cai T. Tony
Liu Weidong
Publication venue: ScholarlyCommons
Publication date: 01/01/2011

This article considers sparse linear discriminant analysis of high-dimensional data. Existing methods estimate the precision matrix Ω and the difference δ of the mean vectors separately. In contrast, we introduce a simple and effective classifier that estimates the product Ωδ directly through constrained ℓ1 minimization. The estimator can be implemented efficiently using linear programming, and the resulting classifier is called the linear programming discriminant (LPD) rule. The LPD rule is shown to have desirable theoretical and numerical properties. It exploits the approximate sparsity of Ωδ and, as a consequence, can still perform well even when Ω and/or δ cannot be estimated consistently. Asymptotic properties of the LPD rule are investigated, and consistency and rate of convergence results are given. The LPD classifier has superior finite-sample performance and significant computational advantages over existing methods that require separate estimation of Ω and δ. The LPD rule is also applied to analyze real datasets from lung cancer and leukemia studies, where the classifier performs favorably in comparison to existing methods.

University of Borås

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Large-Scale Multiple Testing of Correlations

Author: Cai T. Tony
Liu Weidong
Publication venue: ScholarlyCommons
Publication date: 01/01/2016

Multiple testing of correlations arises in many applications, including gene coexpression network analysis and brain connectivity analysis. In this article, we consider large-scale simultaneous testing for correlations in both the one-sample and two-sample settings. New multiple testing procedures are proposed, and a bootstrap method is introduced for estimating the proportion of the nulls falsely rejected among all the true nulls. We investigate the properties of the proposed procedures both theoretically and numerically. It is shown that the procedures asymptotically control the overall false discovery rate and false discovery proportion at the nominal level. Simulation results show that the methods perform well numerically in terms of both the size and power of the test, and that they significantly outperform two alternative methods. The two-sample procedure is also illustrated by an analysis of a prostate cancer dataset for the detection of changes in coexpression patterns between gene expression levels. Supplementary materials for this article are available online.
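The one-sample problem can be sketched as a threshold search. The search finds the smallest cut-off $t$ at which the estimated false discovery proportion, $2\{1-\Phi(t)\}\,m / \max(R(t), 1)$, falls below $\alpha$. Here $m = p(p-1)/2$ is the number of pairs and $R(t)$ is the number of statistics with $|T_{ij}| \ge t$.

This is a simplified sketch, not the paper's procedure. It uses Fisher $z$ statistics $T_{ij} = \sqrt{n-3}\,\operatorname{atanh}(r_{ij})$, which are approximately $N(0,1)$ under Gaussian nulls, in place of the article's statistics. The following are also assumptions of this sketch:

- the search range $[0, \sqrt{4\log p - 2\log\log p}]$;
- the fallback $\sqrt{4\log p}$;
- the grid search;
- the name `correlation_fdr_test`.

```python
import numpy as np
from scipy.stats import norm


def correlation_fdr_test(X, alpha=0.1, grid_size=2000):
    """Sketch: FDR-controlled testing of H0: corr_ij = 0 for all pairs i < j (needs p >= 3)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices(p, k=1)
    r = np.clip(R[iu], -0.999999, 0.999999)  # avoid atanh(+-1)
    T = np.sqrt(n - 3) * np.arctanh(r)  # Fisher z statistics (assumed, not the paper's)
    m = T.size  # number of pairs tested
    b_p = np.sqrt(4 * np.log(p) - 2 * np.log(np.log(p)))  # upper end of the search
    t_hat = np.sqrt(4 * np.log(p))  # fallback if no grid point qualifies
    for t in np.linspace(0.0, b_p, grid_size):
        n_rej = max(int(np.sum(np.abs(T) >= t)), 1)
        # Estimated FDP: expected null exceedances divided by observed rejections.
        if 2 * norm.sf(t) * m / n_rej <= alpha:
            t_hat = t
            break
    reject = np.abs(T) >= t_hat
    return [(int(i), int(j)) for i, j, rej in zip(iu[0], iu[1], reject) if rej]
```

The function returns the index pairs $(i, j)$ declared correlated. A finer grid, or solving for the crossing point within each step of $R(t)$, would locate the threshold more exactly.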

PubMed Central

ScholarlyCommons@Penn

FigShare

A Constrained L1 Minimization Approach to Sparse Precision Matrix Estimation

Author: Banerjee O.
Liu H.
Tony Cai
Weidong Liu
Xi Luo
Yuan M.
Publication date: 01/01/2011

A constrained L1 minimization method is proposed for estimating a sparse inverse covariance matrix based on a sample of $n$ iid $p$-variate random variables. The resulting estimator is shown to enjoy a number of desirable properties. In particular, it is shown that the rate of convergence between the estimator and the true $s$-sparse precision matrix under the spectral norm is $s\sqrt{\log p/n}$ when the population distribution has either exponential-type tails or polynomial-type tails. Convergence rates under the elementwise $L_{\infty}$ norm and the Frobenius norm are also presented. In addition, graphical model selection is considered. The procedure is easily implementable by linear programming. Numerical performance of the estimator is investigated using both simulated and real data. In particular, the procedure is applied to analyze a breast cancer dataset. The procedure performs favorably in comparison to existing methods.

Comment: To appear in Journal of the American Statistical Association
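The column-by-column linear program behind the constrained L1 approach can be sketched as follows. It assumes the formulation $\min \|\beta_j\|_1$ subject to $\|\hat\Sigma\beta_j - e_j\|_\infty \le \lambda$ for each column $j$, solved here with SciPy's `linprog`.

The resulting matrix is then symmetrized by keeping, for each pair, the entry of smaller magnitude. The function name `clime`, the 1/n covariance normalization and the fixed `lam` are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linprog


def clime(X, lam):
    """Sketch: constrained L1 precision-matrix estimate, one LP per column."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n  # sample covariance (1/n normalization assumed)
    # Split beta = u - v with u, v >= 0 so the L1 norm becomes a linear objective.
    A = np.hstack([S, -S])
    A_ub = np.vstack([A, -A])
    c = np.ones(2 * p)
    B = np.zeros((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = 1.0
        # ||S beta - e_j||_inf <= lam written as two one-sided constraints
        b_ub = np.concatenate([lam + e, lam - e])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        B[:, j] = res.x[:p] - res.x[p:]
    # Symmetrize: for each (i, j) keep whichever of B_ij, B_ji has smaller magnitude.
    return np.where(np.abs(B) <= np.abs(B.T), B, B.T)
```

Because each column is a separate LP, the $p$ problems are independent and could be solved in parallel.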

arXiv.org e-Print Archive

CiteSeerX

Crossref

Research Papers in Economics

ScholarlyCommons@Penn

Two-Sample Covariance Matrix Testing and Support Recovery

Author: Cai Tony
Liu Weidong
Xia Yin
Publication venue: ScholarlyCommons
Publication date: 01/01/2013

This paper proposes a new test of the equality of two covariance matrices Σ1 and Σ2 in the high-dimensional setting and investigates its theoretical and numerical properties. The limiting null distribution of the test statistic is derived. The test is shown to enjoy certain optimality properties and to be especially powerful against sparse alternatives. The simulation results show that the test significantly outperforms the existing methods in terms of both size and power. An analysis of prostate cancer datasets is carried out to demonstrate the application of the testing procedures. When the null hypothesis of equal covariance matrices is rejected, it is often of significant interest to investigate further in which way they differ. Motivated by applications in genomics, we also consider two related problems: recovering the support of Σ1 − Σ2, and testing the equality of the two covariance matrices row by row. New testing procedures are introduced and their properties are studied. Applications to gene selection are also discussed.
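A max-type test of this kind can be sketched in NumPy. The sketch assumes the statistic standardizes each squared entrywise difference by an estimate of its variance:

$$M_n = \max_{1 \le i \le j \le p} \frac{(\hat\sigma_{ij1} - \hat\sigma_{ij2})^2}{\hat\theta_{ij1}/n_1 + \hat\theta_{ij2}/n_2}.$$

It also assumes that $M_n - 4\log p + \log\log p$ has the Gumbel-type limit $\exp\{-(8\pi)^{-1/2}e^{-t/2}\}$ under the null. That limit gives the level-$\alpha$ critical value $q_\alpha = -\log(8\pi) - 2\log\log(1-\alpha)^{-1}$.

These constants are stated from memory and should be checked against the paper. The name `cov_equality_test` is hypothetical.

```python
import numpy as np


def _cov_and_theta(X):
    """Sample covariance (1/n) and the empirical variance of each product term."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    prods = Xc[:, :, None] * Xc[:, None, :]  # shape (n, p, p)
    theta = ((prods - S[None, :, :]) ** 2).mean(axis=0)
    return S, theta, n


def cov_equality_test(X1, X2, alpha=0.05):
    """Sketch of a max-type test of Sigma1 == Sigma2; returns (M_n, critical value, reject?)."""
    S1, th1, n1 = _cov_and_theta(X1)
    S2, th2, n2 = _cov_and_theta(X2)
    p = S1.shape[0]
    W = (S1 - S2) ** 2 / (th1 / n1 + th2 / n2)  # standardized squared differences
    M = W[np.triu_indices(p)].max()  # maximum over i <= j
    q_alpha = -np.log(8 * np.pi) - 2 * np.log(np.log(1.0 / (1.0 - alpha)))
    threshold = q_alpha + 4 * np.log(p) - np.log(np.log(p))
    return M, threshold, bool(M >= threshold)
```

Taking the maximum over entries is what makes this kind of test sensitive to sparse alternatives, where only a few entries of Σ1 − Σ2 are nonzero.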

ScholarlyCommons@Penn

Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions

Author: Bickel
Borjabad
Breiman
Cai
Cai
Cai
Cai
Dickstein
d’Aspremont
Fan
Fan
Friedman
Friedman
Lam
Lauritzen
Liu
Meinshausen
Ravikumar
Rothman
Sun
Tibshirani
Weidong Liu
Xi Luo
Yuan
Yuan
Publication venue: Elsevier BV
Publication date: 22/12/2016

This paper proposes a new method for estimating sparse precision matrices in the high-dimensional setting. Fast computation and adaptive tuning have both been widely studied for this problem. We propose a novel approach, called the Sparse Column-wise Inverse Operator (SCIO), to address these two issues. We analyze an adaptive procedure based on cross-validation and establish its convergence rate under the Frobenius norm. Convergence rates under other matrix norms are also established. The method also enjoys fast computation for large-scale problems via a coordinate descent algorithm. Its numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset.

Comment: Main text: 24 pages. Supplement: 13 pages. The R package scio implementing the proposed method is available on CRAN at https://cran.r-project.org/package=scio. Published in Journal of Multivariate Analysis at http://www.sciencedirect.com/science/article/pii/S0047259X1400260
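The coordinate descent idea can be sketched for one column at a time. Under the assumed column objective $\tfrac12\beta^\top\hat\Sigma\beta - e_j^\top\beta + \lambda\|\beta\|_1$, each coordinate has the closed-form update $\beta_k \leftarrow S_\lambda\big(e_{jk} - \sum_{l \ne k}\hat\Sigma_{kl}\beta_l\big)/\hat\Sigma_{kk}$, where $S_\lambda$ is soft thresholding.

The following are assumptions of this sketch, not the scio package's API:

- the fixed `lam` (in place of the paper's cross-validated choice);
- the smaller-magnitude symmetrization rule;
- the names `soft` and `scio`.

```python
import numpy as np


def soft(z, lam):
    """Soft-thresholding operator S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def scio(S, lam, n_iter=200, tol=1e-8):
    """Sketch: column-wise L1-penalized inverse via cyclic coordinate descent."""
    p = S.shape[0]
    B = np.zeros((p, p))
    for j in range(p):
        beta = np.zeros(p)
        for _ in range(n_iter):
            max_change = 0.0
            for k in range(p):
                # Partial residual: e_jk minus the contribution of all other coordinates.
                r = (1.0 if k == j else 0.0) - S[k] @ beta + S[k, k] * beta[k]
                new = soft(r, lam) / S[k, k]
                max_change = max(max_change, abs(new - beta[k]))
                beta[k] = new
            if max_change < tol:  # stop once a full sweep barely changes beta
                break
        B[:, j] = beta
    # Symmetrize by keeping the smaller-magnitude entry of each (i, j), (j, i) pair.
    return np.where(np.abs(B) <= np.abs(B.T), B, B.T)
```

For example, with $\hat\Sigma = I$ each column solves in closed form to $(1-\lambda)e_j$, so the sketch returns $(1-\lambda)I$. Each update costs $O(p)$ and needs no LP solver, which is where the speed advantage over LP-based column estimators comes from.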

arXiv.org e-Print Archive

Crossref